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Abstract. We present a unified framework to deal with threshold functions for the existence 
of certain combinatorial structures in random sets. More precisely, let M ■ x = be a linear 
system of r equations and m variables, and A a random set on [n] where each element is 
chosen independently with the same probability. We show that, under certain conditions, 
there exists a threshold function for the property "A m contains a non-trivial solution of 
M ■ x = 0", depending only on r and m and, furthermore, we study the behavior of the 
limiting probability in the threshold scale in terms of volumes of certain convex polytopcs 
arising from the linear system under study. 

Our results cover several combinatorial families, namely sets without arithmetic progres- 
sions of given length, fc-sum-free sets, Bh[g] sequences and sets without Hilbert cubes of 
dimension k, among others. 

1. Introduction 

The existence of certain structures in large combinatorial systems plays a central role in 
discrete mathematics, and more specially in combinatorial number theory. In the context of 
extremal combinatorics this type of questions has provided an active area of research where 
many different techniques are used. In the general setting, finding extremal conditions is a 
difficult task and generally needs smart ad hoc arguments. One mayor example of this fact is the 
celebrated theorem of Szemeredi |Sze75j on the existence of long arithmetic progressions in sets 
of positive density. See also |Gow011 [Fur77j . Nevertheless, some results have been obtained by 
means of general arguments: in [Ruz93 ( Ruz95 upper and lower bounds for the size of maximal 
sets avoiding solutions to linear equations are obtained. 

The purpose of this paper is the study of the common behavior of a random set in terms of 
the existence (or non existence) of such structures. In this setting we can provide a clear picture 
of what is expected for most sets. This approach allows us to obtain results for a wide variety 
of structures. 

The models of random sets we consider in this work are the analogues of the Q(n,p) and 
Q(n,M) models in random graphs. In the first case, for a probability p (depending possibly on 
n) we consider the random set A provided that P (a G A) = p for every a £ [n] . In the former, we 
fix the number M of elements, and we consider the uniform distribution among the possible 
subsets of [n] with M elements. Despite the two models are not the same, they have similar 
asymptotic behavior when choosing p = ^- JLROO, Luc90 . For practical reasons we work with 
the first model, due to the independence on the choice of the elements. However, with high 
probability, the number of elements of such a random set will be close to np. 

Let P be a combinatorial property and A a random set in [n] . We write A |= P if A satisfies 
P. A property is said to be increasing if B \= P whenever A C B and A \= P. In this context, 
we say that t(n) is a threshold for the property P if 

(i) p = o(t(n)), implies limW(A (= P) — > 0, and 

(ii) t(n) = o(p), implies limP(.4 |= P) ->■ 1. 
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Observe that thresholds are not uniquely defined; but defined within constant factors. However, 
we are interested in the order of magnitude of this transition phase. 

The problem we address in this paper is the following one: consider the linear homogenous 
system of equations 

an%i + • • • + a\rX m = 

: , M = (ay) 6 M r , m {Z). 

k d\ r X\ ~\~ ' ' ' ~\~ O rnr X m 

For conciseness, we call it an (r, m) -system, with r < m. 

Let A C [n] be a random set, where every element is chosen with probability p. We study 
how the quantity \A m fl {x : M ■ x = 0} behaves with respect to p and deduce the existence of 
a threshold function for the combinatorial property Pm defined as "A m contains a non-trivial 
solution of M ■ x = " in terms of the expected value for this random variable. The existence 
of such a function is assured by the fact that monotone properties in random sets always have 
thresholds functions |BT87j . 

More precisely, we focus on systems of equations which are positive (there exist at least one 
solution whose coordinates are pairwise different positive integers) and non- degenerate (with 
maximum possible rank). Positivity is a natural condition and it is necessary to assure that the 
system has solutions in [n] m . We also demand that solutions are non-trivial. The importance of 
defining what a trivial solution is will be discussed later (see Section 0}. 

Roughly speaking, a non-degenerate (r, m)-positive system of equations has positive solutions 
without repeated components and cannot be reduced to another one with a smaller number of 
equations or variables. Under these assumptions, we have the following theorem: 

Theorem 1. Let r < m and M ■ x = be a positive non-degenerate (r,m) -system. Then, the 
probability p = n~~ x is a threshold function for the property Pm'- a A m contains a non-trivial 
solution of M ■ x = 0". 

In other words, whenever the size of A is o(n™) we can assure that asymptotically almost 
surely there are no other than trivial solutions of the linear system M ■ x = with x e A m . 
The main contribution in the study comes from those solutions whose components are pairwise 
distinct, since, roughly speaking, solutions with repeated components appear later in the regime. 

We can also study the behavior of the limiting probability in the threshold scale. With this 
purpose, observe that the system of equations M ■ x = and the restrictions on x define a 
non-empty, convex and rational polytope of dimension m — r. 

With this definition in mind, we show that there exists an exponential decay which depends 
on the volume of Vm and the number of variables involved in the system of equations, but not 
on the number of equations. More precisely, 

Theorem 2. For p = cn™^ 1 , 

lim F(A \= Pm) = 1-e 

n— >oo 

where Vm is the polytope associated to the system M ■ x = and [im is a computable constant 
which depends on the symmetries of M . 

In a general setting, the computation of the constant Vol (Vm) appearing in Theorem [2] is a 
difficult problem. The computation of such a volume could be obtained by means of triangulations 
of the polytope [DLRS10] , but the problem is in general computationally involved [BEFOOj . 

In this work we consider the precise analysis of interesting combinatorial families which fit 
into the presented scheme. More precisely, a set of integers is an arithmetic progression of length 
k (or shortly, a fc-AP) if it can be written in the form a, a + d, . . . , a + (fc — l)d for some a, d G Z 
and d ^ 0. 

A set of integers A is called a Sidon set (or ^[l] set) if every integer n has at most one 
representation as a sum of two elements of A. One can generalize this concept in several ways; 
for example a set A of non- negative integers is a Bh [g] set if every integer has at most g repre- 
sentations as a sum of h elements of A, modulo permutations of the summands involved. 
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Another possible generalization are the so called Hilbert cubes: a set H of integers is a Hilbcrt 
cube of dimension fc (or fc-cube) if there exists positive integers ho, hi, . . . , hk satisfying 



H = <h 



k 

£ 

i=l 



tih % : e % £ {0, 1} 



Clearly a set A is Sidon if it does not contain any 2-cube. As it is shown by Sandor |San07| 

almost all sets in [n] with size n 5* contain a fc-cube for every e > 0. 

A set A contains a k-bary centric sequence if there exist oi, . . . , flfe, dk+i £ A such that 

ai+a 2 A + a k = kdk+i, 

that is ttfe+i is the average of ai, . . . , a^. Clearly if fc = 2 that is a 3-AP and trivial solutions 
are given by a± = • ■ • = du+i- Finally, a set of integers A is a k-sum-free set if for every pair 
a, a' £ A the sum a + a' is not an element of {ka : a £ A}. 

The existence of such structures could be codified using systems of equations of the type 
M ■ x = for matrices M £ Ai r<m (Z). A set A avoids a fc-AP if the homogeneous system 
Mk-AP • x = does not have a non-trivial solution x = (x\, . . . , Xk-i) £ A , where 



M, 



k-AP = 



1-2 1 
1-2 1 



£ M fe _2, fc (Z). 



In this case all trivial solutions (see Definition [3} are given by x\ — x 2 — 
correspond to the case d — 0. A set A is a Sidon set if there are no solutions x 
A 4 of the linear system x\ + x 2 



■ — Xk £ A and 

(xi, x 2 ,x 3 ,Xi) £ 

X4,, except from the trivial ones, which have the form 
either (a,b,a,b), (a,b,b,a) for a,b £ [n] . Similarly, a set A is Bh[g] if there are no solutions in 
j\h(g+i) f i mear system defined by 



•i'3 



(1) 



M 



B h [g] 



1 -1 
1 



-1 

1 -1 



-1 



£ M 



g,h(g+l) 



;z). 



i ••• i -i -i . 

A set A avoids 3-Hilbert cubes if it does not contain solutions to 

-li 1-1 



3-H = 



1 

-1 



1 -1 

-1 1 1-1 
-1 1 



i 

-110 

and in general for a fc-Hilbert cube we will have a (2 k — (fc + 1), 2 fc )-system. 

Finally, a set A is k-sum-free if there are no solutions x = (xi,X2,X3) £ A 3 of the linear 
system x± + x 2 — kx^, (when fc = 2 we do not accept the the trivial ones x 2 = 0,xi = £3). It 
is clear from the definition of Mk-AP and M Bh t g ^ and the k-sum-free family that all matrices 
have maximum rank, that is r = fc — 2 and r = g and r = 1, respectively. The application of the 
previous theorems and the computations in Section [B] give the following table: 



m 


P 


E[\A\] 


k 


„-*/* 


n l-2/k 


4 


n -3/4 




h(g + 1) 


9 1 




nM.?+i) 


n h(g+l) 




fc+i 


1 k+1 


2 fe 


n 2 fc 


n 2 k 


3 


„-2/3 


n l/3 


k+1 


n -k/k+l 


n l/k+l 



Vo\(V_ 



M ) 



k-AP 
Sidon 

B h [g] 

fc — cube 
fc — sum — free 
fc — bariccntric 



fc-2 
1 

g 

- (fc + 
1 
1 



2(fe-l) 
2 
3 

Section [5] (g 

2 k 
(fc+l)!fe! 
1 
k 
1 



1 

8 

l)!(/ l !)f+ 1 
2 k 

2 
fc! 



Table 1. Threshold for different combinatorial families. 



Nevertheless, let us note that this general approach allows us to study all linear structures at 
once but some aspects of this problem are hard to generalize. For example, it is clear by the 
arithmetic definition of a fc — AP or a Bh [g] set what a trivial solution should be, but in general 
it is not so obvious and we will explain how this should be considered. On the other hand, if we 
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estimate solutions to this linear systems we will overcount solutions arising from the symmetries of 
the problem; for example whenever we have a solution to x\+X2 + - ■ ■+xt l = Xf l +i+Xf l +2 + - • ■+%2h 
we immediately have 2(h\) 2 other solutions obtained after permuting the summands of every 
representation and reversing the order of the representations. Therefore we must consider the 
set of solutions modulo permutations. As we will see in the proof of Theorem [T] it suffices to 
study in detail those solutions whose components are pairwise different and, in this case, it is 
easy to compute the number of permutations between them. We include this computation on 
the previous table and denote the number of such symmetries by /im- 

State of the art. In the presented approach, we intended to give a picture of qualitative behavior 
of a random set. However one might wonder how far is the common situation from the extremal 
cases. The problem of estimating the size of maximal sets avoiding these structures has been 
intensively studied. In this direction one can find several results which give upper bounds for 
sets avoiding an specific structure or, on the opposite direction, explicit constructions of large 
sets with this property. In both cases one requires ad hoc arguments that strongly depend on 
each specific problem. 

For sets avoiding fc— AP's we must go back to Szemeredi's Theorem, that states that no 
set with positive density can avoid fc— AP's for any k. In particular, for k = 3 non-trivial 
bounds were firstly obtained by Roth |Rot52j and then refined by several authors, see HB87, 
IBou08| . Nowadays, the best upper bound is established by Sanders jSanllj . On the other hand, 
Berhend [Beh46j constructed a set avoiding 3- AP's of large size; this construction was slightly 
improved by Elkin [Elkllj (see also |GW10j ). More precisely, we have 

n ■ « max{|„4| : A avoids 3-AP's } « „ ■ (1 °f° S " )5 , 

e c vi°g™ Ac[n] logn 

for some constant c. 

Concerning the general fc-AP problem, analogous bounds have been obtained: briefly, the 
upper bounds come from the pioneering work of Gowers |Gow01j and, more recently, dense 
constructions that lead to lower bounds for this problem were stablished by O 'Bryant [O'Bllj . 
These results can be summarized as follows 

(logn) (21ogfe)_1 „- 2 ( fc +9) 
n ■ =— < max{U| : A avoids fc-AP's } < n ■ (log logn) 1 , 

for a certain constant c(fc) only depending on k. 

We show that almost all sets with size n 1 - 2 /^ 6 contain fc-AP's, for every e > 0. Observe that, 
for fc = 3 the gap between the usual situation and the extremal set is very large: most sets with 
size n 1 / 3+e contain 3-AP's but there are examples of (almost) linear size avoiding this structure. 
Nevertheless, as fc grows to infinity, this quantity approximates to n and the gap between the 
exponents tends to 0. 

The study of Sidon sets dates back to Erdos. In |ET41] Erdos and Turan obtained an upper 
bound for the size of a maximal Sidon set in [n] (see |Lin691 ICillOj for further improvements 
of this result). In fact, there are algebraic constructions of Sidon sets that, combined with 
Erdos- Turan result, provide 

max {\A\ : A is Sidon } ~ n 1/2 . 

Ac[n] 

In the direction of the present article, the Bh[l] case was studied in detail by Godbole, Janson, 
Loncatore and Rapoport in [GJLR99] . They show that almost no set with 

n l/2h+e is for 

every e > 0. Clearly, for h — 2 (that is Sidon), the gap between the exponents in the usual 
situation, namely |„4| = o(n 1 / 4 ) and extremal one, say |*4.| = ri 1 / 2 , is very big. 

Concerning the general case, it is known that the cardinality of a maximal Bh [g] set in [n] is 
x n 1 ' h , but the main difficulty is to obtain a precise constant for the problem |CRT02i ICRVlOj . 
As we show in Theorem [T] almost all sets in [n] of size (n 1 / h - 1 / ht -9+ 1 )) are B] r [g}. Once again, 
if we fix h and let g grow to infinity both situations approach each other. 
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Hilbert originally proved that any finite coloring of the positive integers contains a monochro- 
matic fc-cube. The density version of this result is known as Szemeredi's Cube Lemma and it is 
a key point in his proof of Roth's Theorem. Gunderson and Rddl [GR98] obtained, by counting 

arguments, that for sufficiently large n, any set A 6 [n] with size 2n 1 ~ ^ k ~ 1 contains a fc-cube. 

i i 

On the other side, by means of probabilistic arguments, one can construct a set of size n 2k - 1 
avoiding fc-cubes. For the particular case fc = 3, Cilleruelo [Cil] claims to have found an algebraic 
construction of a set of size x n 2 / 3 avoiding 3-cubes. 

As in the previous cases, when fc grows the existing gap between the exponents in our result 
and the ones in the upper and lower bounds tends to 0. 

The question of maximizing the cardinality of a set of integers in [n] avoiding x + y — z 
belongs to the folklore: one cannot select more than \^~\ integers satisfying this condition and 
this is optimal. The case k = 2 coincides with the exclusion of 3-AP's. Concerning k = 3, the 
problem was solved by Chung and Goldwasser [CG96a] getting the same estimates as for k = 1. 
For k > 4, and sufficiently large n, Chung and Goldwasser [CG96b discovered fc-sum-free sets of 
linear size in n (and density tending to 1 as k increases); in fact Baltz, Hegarty, Knape, Larsson 
and Schoen BHK+05] showed that this construction is optimal. Therefore, for this family it is 



known that the maximal size of a fc-sum-free set is linear in n but Theorem [T] asserts that almost 
all sets of size n 1 ^ 3+e contain at least one solution to x + y = kz, for every fc and positive e. 
Observe that in this family, the parameter fc does not play a role in the position of the threshold. 

Plan of the paper: In Section [3] we introduce the precise notation we use in this paper and 
prove a useful counting lemma. In Section U we prove Theorem [TJ and in Section [S] we study the 
local behavior of this threshold. The analysis of VoI(Vm) associated to combinatorial families is 
carried out in Section [BJ Finally, in Section [7] we discuss related problems and generalizations. 

2. Tools 

In this section we recall the second moment method and Janson's inequality in the context of 
the probabilistic method, as well as basic notions in Ehrhart's Theory for counting lattice points 
in convex polytopes. 



2.1. The second moment method. The second moment method is used in the version given by 
Corollary 4.3.4. of Alon, Spencer |AS08j : let A = Ij + • • -+I S be a sum of s independent indicator 
random variables, where Ij is associated to a certain event (namely, P(Ij = 1) = P(Ei), 
P(Ij =0) = 1 — F(Ei)). We write that i ~ j if i ^ j and the events Ei and Ej are not 
independent. Define 

(2) A = £>(.E ( A£ i ) 

Under these conditions, if E[A] — > oo and A = o (e,[X] 2 ^J (as s — > oo), then X ~ E[A] asymptot- 
ically almost surely. In particular, under these assumptions, X > with probability tending to 1. 



2.2. Janson's Inequality. The form we apply Janson's inequality is the one in Theorem 8.1.1. 
of Alon, Spencer |AS08j : let {Ei}i & j be a set of events. Assume that there exists e > such 
that for all i G J, P (E,) < e. Then 

(3) n p (^) ^ p (a^) <e^n p (^)' 

iei Kiel I iei 

where A is the expression defined in Equation ©. 
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2.3. Lattice points in dilates of polytopes: Ehrhart's Theory. A basic reference for 
definitions and first properties of convex polytopes is Zie95 . For further results in lattice points 
in rational polytopes, see |BR07[ IDL05] . 

A convex polytope is the convex hull of a finite set of points (which are always bounded) , or a 
bounded intersection of a finite set of half-spaces, and is said to be rational (respect to integral) 
if its vertices are points with rational (resp. integral) coordinates. Every rational polytope has 
a representation the form {x £ M. d : M ■ x = b}, A £ _Md, m (Z), b £ Z d for certain non- negative 
integers m,d (note that its dimension is not necessarily d, but a smaller non- negative integer). 
For a given polytope V, let Vol('P) be the volume of V and n ■ V — {np : p £ V} the nth-dilate 
of the polytope. 

Ehrhart Theorem |Ehr62j (see also |Mac63p gives a precise description of the number of integer 
points on the nth-dilate of a rational polytpe in this context: the quantity |n • V f~l Z^" 1 ^] is 
given by a pseudopolynomial in n of degree dim(7- > ) (recall that a pseudopolynomial is a function 
f( n ) — 12i=o c i( n ) nl where the functions Co(n), . . . , Cd(n) are periodic). A convex polytope is 
said to be integral (resp. rational) if its vertices have integer (resp. rational) coordinates. In 
this context, we have the following generalization of the previous result 

Theorem 3 (Ehrhart's Theorem). Let V be a d- dimensional convex polytope. 

L- If ' V is an integral polytope, then |n • V n Z d | is a polynomial in n of degree d. 
ii.- If V is a rational polytope, then \n-VC\'L d \ is a pseudopolynomial in n of degree d. 
Additionally, its period divides the least common multiple of the denominators of the 
coordinates of the vertices ofV. 

Additionally to Theorem [31 the leading coefficient in both cases is equal to Vol('P). As a 
trivial corollary, for a rational polytope V of dimension dim('P) embedded in R dlm ^\ 

n-Vn Z dim ^> = Vol(7>)n diro ^ (1 + o(l)). 

3. Notation and a lemma 

Recall that for r < m a system of r linear equations in m variables is called a (r, m) -system. 
A system of linear equations is positive if there exists one solution whose coefficients are pairwise 
different positive integers. A positive (r, m)-system is non-degenerate if it has the maximum 
possible rank, that is r. 

The key point on this analysis is to correctly define what a trivial solution is. Observe that 
in some of the problems discussed before it was very clear how trivial solutions look like. For 
example, trivial solutions to fc-AP's are given by a\ = ■ ■ ■ = ak and thus any nonempty set 
contains such structure. In order to study the threshold we must avoid these kind of degenerate 
cases and understand what it means for the general setting. 

Let us discuss these two points in a precise way. Let M-x = be a (r, m)-system and associate 
to the variable Xi its corresponding index i. Let p be a set partition of the set [m] into s blocks. 
Observe that p defines a new system of equations M' • x' = obtained after taking the initial 
system M ■ x = and matching the variables of each block of p. In particular, if the number of 
variables of M' • x' = is equal to s we say that the (r 1 , s)-linear system of equations M' ■ x' = 
is subordinate to M ■ x = 0. 

Subordinate systems encode solutions of the initial system with, at least, two repeated com- 
ponents. For a given solution x of the system M ■ x = 0, we denote by p(x) the corresponding 
set partition of the indices [m]. We also write |p(x)| for the number of blocks in the partition. 
In particular, if x is a solution with pairwise different components, then p(x) = {{1}, . . . , {m}} 
and |p(x)| = m. 

Observe that not every partition p will come from a solution x, or reversely not every subordi- 
nate system will have solutions with pairwise different components. For example, if one considers 
the equation x\ +X2 = X3 + X4 it is clear that the related partition p = {{1}, {2, 3}, {4}} (that is 
X2 = X3) necessarily implies x\ = £4, and thus the subordinate system will no longer be positive 
nor non-degenerate. This observation is crucial in order to define what a trivial solution will be. 
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Definition 4. We say that x is a trivial solution to M ■ x = if the rank of the subordinate 
system related to p(x) is strictly smaller than the rank of M. 

In particular we do not study those solutions with = s < r. This notion generalizes the 

definition of trivial solution in the case r — 1 introduced in [Ruz93] . Roughly speaking, whenever 
the rank of the matrix decreases we loose information arising from (at least) one of the initial 
equations and that means that we are no longer dealing with the same arithmetic structures. 

We will now discuss some examples to motivate this definition. In Sidon sets, which are 
defined by the equation x\ + x 2 = £3 + 24, we have subordinate systems like x\ + 22 = 2^3 
(namely 23 = X4 in the initial system) that give arise to nontrivial solutions since the rank of the 
resulting system is 1. However, as we said before, if one considers the partition x\ = £3, £2 = ^4 
then the resulting system has zero rank and thus all solutions of this kind are trivial. 

Let us consider Bh [g] sets and discuss how trivial solutions look like. Recall that a set A is no 
longer a Bh[g] set if there exists g + 1 (essentially different) representations of the same element 
as sums of h elements of A. That is, there exists elements at G A with 

ai H 1- ah = a/i+i + • • • + a 2 h = ■ ■ ■ = dhg+i H + a-h(g+i)i 

and all representations are pairwise different, that is: none of them is obtained after permuting 
the h elements of another representation. Let us focus on B^[2] sets, for example, to illustrate 
what situations could be found. Here, we must avoid solutions to 

x\ + Xi + £3 = X4 + x 5 + x e = x 7 + x$ + .x 9 , 

and we are excluding situations like x% = x±, X2 — x$, X3 — xq (that reduces the rank to 1) but 
not X\ — X3 = £5 = X7, X4 — x$ which is still a valid solution, 

JIT _ ( 1 1 1 -1 -1 -1 \ Xl=X3=Xji=X 7 / 1 1 _1 _1 \ Z4=£C8 H 1-1-1 \ 

IV1 B 3 [2] — 1,000 1 1 1 -1 -1 -1 / f \ 1 1 -1 -1 I f V 1 -1 ) ) 

since the resultant subordinate has rank two. As we have seen in the Sidon case, representations 
with repeated components are valid, and if ft > 3 we also consider representations that have 
some elements in common but not all at once. 

The main contribution to the analysis of the threshold will come from solutions with pairwise 
different components. The number of such solutions is, nevertheless, easier to count than the 
number of solutions with repeated components (as we are dealing with general systems). The 
main difficulty is to prove that the contribution of those solutions with repeated components is 
negligible. 

We associate to the (r, m)-system M ■ x = the coordinate expression 

o-uxi + • • • + a\ r x m = 

(4) < : M = (aij) e Mf, m (Z). 

k &\ r x\ -\- • • • a mr x m 

Observe that the system M ■ x = 0, < x±, . . . ,x m < 1, defines a rational polytope Vm of 
dimension m — r (as we are assuming that the system has the maximum possible rank and the 
polytope it is not empty by the positive assumption). 

The following lemma will be applied in the forthcoming sections and simplify the discussion: 

Lemma 5. Let r < m and M ■ x = be a non- degenerate positive (r,m)-linear system. Then 
the number of solutions x £ [n] m of M ■ x — with pairwise different coordinates is of the form 
Vol (Vm) n m ~ r (l + o(l)) ; where Vm is the rational polytope defined by M . 

Proof: The number of lattice points in n ■ Vm is precisely the number of solutions of M ■ x = 
such that x G [n] m , which is Vol (V M ) n m - r {l + o(l)). 

We continue considering the set of solutions x G [n] m of M ■ x = with some repeated 
component. These solutions could be codified as solutions of systems which are subordinated to 
M ■ x = 0. As in the initial case, a fixed subordinate system defines a polytope with dimension 
strictly smaller than m~r (in fact, this polytope could be obtained as the intersection of Vm with 
a proper subspace). Hence the number of positive solutions of a system subordinate to M ■ x = 
whose components are bounded by n is 0(n m ~ r ~ 1 ). Finally, the number of subordinate systems 
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is bounded by the number of partitions of {1, . . . , m}, hence the total number of solutions with 
repeated components is o{n m ~ r ) and the lemma follows. □ 

We now consider for a given non-degenerate positive (r, m)-linear system and a given in- 
teger 1 < t < m — 1, the set of pairs of solutions to M ■ x = in [n] m with exactly t co- 
ordinates in common; that is, we want to count the number of pairs (x, y) of solutions with 
\{xi,...,x m ,yi,.. .,y m }\ =2m-t. 

Corollary 6. Let r < m, M ■ x = be a non-degenerate positive (r,m)-linear system and 
1 < t < m— 1 a fixed integer. Then the number of pairs {x, y) of solutions x 1 y £ [n] m of M-x = 
with pairwise different coordinates andt coincidences is O (n 2m ~ 2r_ *) 7 and the constants involved 
only depend on t and M . 

Proof: Observe that for a fixed set of t coincidences, say {x^ — yj 1 , . . . , Xi t — yi t }, the number 
of pairs (x, y) correspond to solutions of a new (2r, 2m — t)-linear system by considering the 
natural subordinate system to 

/ i\/r n \ / \ 

0, 

obtained by matching those coordinates that coincide. It is clear that this is, in fact, a non- 
degenerate linear system, and therefore by the previous lemma it has 0(n 2m ~ 2r ~ t ) solutions. 

Observe that the number of possible sets of t coincidences is precisely (™) , which concludes 
the proof. □ 



f M 


) 




\ o 


M ) 


■ ; 



4. Proof of Theorem \T\ 

Let x, y be two solutions of the system. We say that x = y if x is obtained after permuting the 
coordinates of y. Let Sm = {x G [n] m : M ■ x = 0} be the set of non-trivial solutions (possibly 
with repeated components) of the (r, m)-system M ■ x = modulo permutations. Thus we will 
only count essentially different solutions. 

Observe that not every permutation of a solution will be a solution. For example, if one 
considers x + y = z with solution (x, y, z) = (a, b, c), it is clear that (b, a, c) is also a solution, but 
(c, a, b) is not. 

For x £ Sm, denote by E x the event x £ A m , and I x the corresponding indicator random 
variable. Then it is clear that P(£ , x ) = pl p ( x )l, where p(x) is the set partition associated to x. 
Observe that if x £3 y then P (E x \E y ) = P (E y \E x ) = 1, hence E x = E y . 

Let X denote the random variable that counts the number of different solutions of (U]) whose 
components lay in A m , that is the number of classes in Sm- We can express X as 

(5) X = E l ~ 

xGSm 

Therefore by splitting the sum in ((5]) in terms of the size of the corresponding set partition we 
have 

m 

(6) e[x]= Yl p (^x) = E J2 p s 

x e S M s=r x 6 Sm 

|p(x)| = s 

The quantity E[X] can be estimated by analyzing the sum in (0 and showing that the main 
contribution arises from the term s = m (corresponding to solutions with pairwise different 
coordinates). By Lemma [5] this contribution to E[X] is equal to 

J2 P(£x)= J2 P m = ^f 1 n m - r p m (l + o(l)), 

x e S M x e S M 

|p(x)| = m |p(x)| = m 
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where /iM denotes the number of solutions on each class x £ Sm with |p(x)| = m, or equivalently 
the number of permutations between solutions of pairwise different coordinates. Observe that 
the number of representatives on each class of Sm will depend on the associated partition p(x), 
but not only on its size |p(x)| = s. Nevertheless, since for s = m there is a unique possible 
partition, the number of representatives on each class of solutions is exactly /ij/. 

Consider now a fixed set partition p = p(x), for some x £ Sm, of size r < s < m. It defines a 
subordinate (r, s)-system and consequently by Lemma [5] 

J2 V s = (n-y) . 

x e S M 
P(x) = p 

Summing now over all set partitions of size smaller than m we obtain the estimate 

m 

™ = E E p s 

s=r x G S M 
|p(x)| = S 

Observe now that if p = o (n™ ~ 1 ) we have that E[X] = o(l), as n — > oo. That is, X = almost 
asymptotically surely in this range, and we have shown the first part of the theorem. 

For bigger values of p, say n~~ x — o(p), we have that E[X] — > oo as n — > oo. We must 
study carefully the second moment of the variable X in order to conclude that X > almost 
asymptotically surely in this range of p. With this purpose, let us study the quantity A defined 
in Equation @. 

For two elements x, y we have that x^y and E x and E y are not independent if and only if 
< |{xi, . . . , x m } (~l {yi, ... , Um}\ — t < m, that is they have exactly t coincidences. Under these 
hypothesis and following the notation of Subsection we write x ~ y. It follows from Corollary \6\ 
that, if we restrict ourselves to solutions with pairwise distinct coordinates, the number of pairs 
of such solutions with t coincidences is O (n 2m ~ 2r - t ^ . This result can be extended to general 
solutions, with possible repeated coordinates, as we have seen earlier. 

Therefore, 

m— 1 

A = ^P( J B x A J E y )= J2 E F (^xA£ y ) 

x~y t=l |xHy|=i 

rn— 1 

« n 2m - 2r -y m -* 
t=i 

(7) = n 2m ~ 2 > 2m ({npy 1 + (np)~ 2 + ■■■ + (np)- m+1 ) = o (e[X] 2 ) , 

since P (E x A E y ) — ■p 2m ~ t ) m > 1, and clearly (np)^ 1 — o(l) in this range for p. 
We can conclude that X ~ E[X] asymptotically almost surely. In particular we have that, in 
this range, X > with probability tending to 1. 

5. Proof of Theorem [2] 
Observe that in the range p = cn™^ 1 , it follows from Equation ([7]) that 

A = O (n 2m ~ 2r p 2m ({np)- 1 + ■■■ + (np)- s )) = O (V r / m ) = o(l). 

Hence, by Janson's Inequality (0) we have that 

n p(^)=p( a £x) (i+ (i))- 



= V °^ M) Tl m - r p m (l + 0(l)) 
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We continue estimating this quantity, by splitting the general contribution among solutions with 
pairwise different components and the rest. The first contribution is by Lemma [S] 

n p(^)=(i-/») z2 S Min "^ (i+o(i)) 

X g S M 

|p(x)| = m 

/ ,\ lu n , Vol(-p M ) m 

(8) = (l - c m n^ m -^) (1 + o(l)) *i=ty e ^ c . 

Let us analyze the former contribution. For a fixed partition p of [to] into s blocks we have that 
the number of x £ Sm with p(x) = p is 0(n s ~ r ). Therefore we get 

_ / a \ 0(n"- r ) 

o) n p (^=( i -^)° (n ^ ) =( 1 -^^) (i+ (D)™i. 

p(x) = p 

since clearly s — s(r/m) > s — r. Therefore, combining Equations © and (0) we have 
lim P (A h Pm) = 1 - lim TT P (ID = 1 - , 



v°'(y M ) . 
e mm 

n— s-oo 

x 6 Sj 



and the result follows. 



6. The computation of Vo\(Vm) 

In this section we consider the question of computing the constants VoI('Pm) involved in both 
Theorem 1 and 2. As we have shown in previous sections, the constant VoI('Pm) is the volume of 
the polytope defined by the equations M ■ x = 0, where the components of the vector x belong 
to the closed interval [0,1]. 

We study the fc-sum free sets as a warm up. Note that the fc-barycentric case could be treated 
with the same ideas. Secondly we analyze Ehrhart's Polynomial for the polytope associated to 
fc-AP's by means of elementary arguments. For the Bh[g] family, we obtain an exact formula by 
means of Vandermonde's determinants. Finally, the volume in the case of /c-cubes is not analyzed 
here, but observe that the volume can be deduced in this case from the results of |San07] . 

6.1. fc-sum- free sets. As a toy example, let us compute the volume of the polytope associated 
to sum-free sets, obtained from the linear equation x\ + a; 2 = £3, < Xi < 1. The associated 
polytope can be defined as follows 

Vi-SF = {(xi,x 3 ) :0<ii<i 3 <l}cl 2 , 

since x-i — X3 — x\ € [0, 1] for any (2:1,2:3) £ Vi-sf- Clearly V\sf is an integral polytope, since 
it is in fact the triangle with vertices (0,0), (0,1) and (1,1), and an easy computation gives a 
volume equal to 4. 

However, let us obtain this value by means of interpolation arguments. It follows from 
Ehrhart's Theorem that n • Vi-sf H Z 2 | = f(n) for a polynomial / of degree dim^Pt-sp) = 2; 
namely f(n) = Yo\{V\-sF)n 2 + bn + c. It is clear that /(0) = |{(0,0)}| = 1 (which gives 
c = 1), /(l) - /(0) + |{(0,1),(1,1)}| = 3 (thus b - 2 - VoI(TVsf)) and /(2) = /(l) + 
|{(0,2),(1,2),(2,2)}| = 6. Therefore 

/(2) = 4Vo1(7Vsf) + 2b + c= 2Vol(V 1 - SF ) + 5 = 6=^ Vol(7V SF ) = i 

as we wanted to show. 

The case k > 1 is slightly different: here we consider the set 

V k -SF = {(xux 3 ) : 0<fcc 3 -a?i<l, < zi, £ 3 < 1} C R 2 , 

which is a parallelogram instead of a triangle. Its area is equal to j. The main difference is 
also that in the first case we obtain a polynomial, despite in the second case we may obtain a 
pseudopolynomial. 
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We continue computing V61(Vm) m the case of fc-AP's and also for Bh[g] sets. In the first 
case by elementary means we obtain the closed expression for the volume. In the former case, 
we apply interpolation arguments to obtain general expression in terms of determinants. 

6.2. fc-AP free sets. This family has been studied widely. For instance, the same result we 
obtain is implicitly stated in |San07j . For completeness we include the analysis here. As we have 
seen earlier, a fc-AP 

xi = a, X2 = a + d, X3 = a + 2d, . . . , Xk = a + (fc — l)d 

can be expressed as a solution x = (xi, . . . ,Xk) of a linear system of rank k — 2 in fc variables. 
We can count the number of fc-AP with elements in [n] U {0} by direct counting: 

Proposition 7. For any integer fc > 3 the number of k—AP (including trivial ones) in [n] U {0} 
is given by 

2 



(n + 1) 



fc - 1 



1 - 



fc- 1 



fc - 1 



fc- 1 



Proof. Observe that any fc— AP is of the form {a, a + d, . . . , a + (fc — l)d} where a G [n] U {0} 
and d € {0, 1, 2, . . . , LfttJ}; since 

0, b£iJ,%^J,.-.,(*-i)L]frJ<« 

is a fc— AP and the equality holds for multiples of fc — 1. Additionally, for a given d we have that 

{0,eJ,...,(fc-l)d)}, {l,l + d,...,l + (fc-l)d)}, {n-(k-l)d,n-(k-2)d...,n} 

are the only fc— AP with common difference d. 
Thus the total number of fc— AP is given by 



(n+ 1 - (fc - l)d) = (n+ 1) 



d=0 



fc - 1 



fc - 1 



fc - 1 



□ 



Corollary 8. The polytope associated to a k—AP family has volume 2 (k-i 



Proof. Let Vk denote the associated polytope. By Ehrhart's Theorem it follows that the number 
of fc-AP in [n] U {0} is equal toVol^fc)™ 2 + 0(n), and it follows from Proposition [7] that this 
quantity is given by 



(n + 1) 



fc- 1 



1 



fc- 1 



1 



2(fc - 1 



-n 2 + 0{n). 



□ 



6.3. The Bh[g] family. A polytope with unimodular matrix (namely, each quadrangular sub- 
matrix has determinant either or ±1) is integral |Sch86j . We start proving that the polytope 
associated to the Bh[g] is integral, hence we can use the usual interpolation technique in poly- 
nomials. 

Proposition 9. The polytope associated to the Bh[g\ family is integral. 
Proof. We study the minors of the matrix expression given by 



V M = {x : M ■ x < 0} n {x : (-M) • x < 0} n [0, l] m C M ; 
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where M denotes the matrix that appears in Equation ([1}. It is obvious that we only need to 
prove that all minors of the matrix 



/ 1 A 
A 
A 



(10) 



V 



-1 A -1 
1 A 1 
A 



A 
-1 A -1 

1 A 1 



\ 



/ 



belong to the set {0,±1}. Observe that we can reduce our argument to minors with entries in 
the top part of the matrix (namely, the top part defined by the horizontal line in Matrix (|10p). 
We argue by induction on the size of the minor. The result is clear for minors of size 1, as the 
entries of the matrix belong to {0, ±1}. Assume that the result is true for every minor of size 
smaller than k, and let us show that the result is also true for k. With this purpose we use the 
fact that every column of Matrix (jTUJ) ) has at most two elements different from 0. 

Consider the first row of the minor under study. If all elements are equal to 0, the minor is 
equal to 0. If there exist a unique element different from 0, we apply induction by developing 
the determinant along the row. Finally, let us assume that there exist in the first row at least 
two elements different from 0. Finally, observe that: 

1. - if these two elements in the first row are equal the corresponding columns are linearly 

dependent, and the determinant is equal to 0. 

2. - if these two elements are different, the column where 1 belongs just contain 0: by con- 

struction a minor cannot have a —1 below a 1 entry. Hence we can develop the determi- 
nant by this column and we apply induction. 



With this analysis we cover all possible cases, and the proof is finished. 



□ 



We continue computing the number of solutions of the system of equations M Bh t g ] • x = 0, 
such that the components of x belong to [n] U {0}. We proceed by direct counting, applying the 
inclusion-exclusion method. Writing k = k\n + ki (where < k\ < h — 1 and 1 < ki < n, or 
k\ = 0, &2 = 0), the number of solutions of the equation x\ + • • • + Xh = k with Xi £ [n] U {0} is 
equal to 

ki 

v~v_ -^j f h \ ( ( fc i _ .i) n +fa-j+h-l 

3=0 



jj\ h-1 



Observe that k is smaller or equal than hn. Consequently, the total number of integer points 
in the polytope defined by the equations Mg h r s i • x = and each component of x belonging to 
[n] U {0} is equal to a function 



fc 1= 0fe 2 = l \j=o 



a 



Now the argument used in the case of fc-AP does not work, as expressions are more involved. 
However, be can apply an interpolation argument to obtain the dominant term of fh,g(n): by 
Proposition [9] and Theorem [3l fh,g{ n ) is a polynomial of degree hg — g + 1. Hence, the values 
//t,g(0), fh,g(l)t * ■ * , fh,g(hg — g) determine completely fh, g (n). In particular, the volume of the 
polytope is the coefficient of the term n h9 ~ 9+1 in fh,g(n) is written by means of Vandermonde's 
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determinants in the following way: 
(11) 



1 
1 1 
1 2 



1 




1 



fh,g(0) 
h,g{2) 



1 

1 1 

1 2 

1 hg-1 




1 




1 

2 h 9-9 



(hg - g) 



hg—g—l 



(hg 



hg~g ••• (hg-g)^-"- 1 hjhg-g) 

We recall that a detailed study for the threshold in S/,[2] sets can be found in GJLR99 . 
In this work Godbole et al. studied the random variable that counts the number of solutions 
(a, b) = (oi, . . . , ah, &i, . . . , bh) of the equation 

(12) ai+a 2 -\ \-a h = bi + b 2 Vb h , 

with a\ < a 2 < ■ ■ ■ < a-hi b\ < b 2 < ■ ■ ■ < bh and a < b with respect to the lexicographic 
order. They obtained the volume of the associated polytope by means of trigonometric sums and 
Fourier analytic methods. More precisely, this volume is given by Equation (16) in GJLR99 : 

j2(-iy( 2h )(h-jr-\ 



i 



Kh = 



2(/i!) 2 (2/i- 1)! 



hg-g 



As we have seen before, it suffices to study carefully the number of solutions with pairwise 
different components. Therefore, in terms of our approach, this result can be translated into 

Vol(P Bh[ i]) = 2{h\) 2 n h = ^ Bh [g] R h, 

since for every ordered solution to (fT2"j) we must count 2(h\) 2 different solutions (obtained by 
permuting the <Zj and the bj components, and then considering the symmetric solution (b,a)). 

These constants correspond to the first column in the following table (g = 1). Values of the 
volume of the polytope with h,g < 6 are computed in Table [5] using Equation (TTTj) . Closed 
formulas for bigger values of g seem to be much more involved. 



h\g 



1 


2 


3 


4 


5 


2 


1 


'2 


l 


'2 


3 


2 


5 


3 


7 


11 


12 


379 


565 


6759 


20 


35 


1680 


3696 


64064 


151 


1979 


40853 


200267 


825643615 


315 


7560 


270270 


2223936 


15084957888 


15619 


4393189 


1865002207 


342366164065 


689860777579903 


36288 


20756736 


16937496576 


5792623828992 


21316855690690560 


655177 


45515121 


1549892743123 


1931111804640401 


31400953991819767493 


1663200 


256256000 


18284797440000 


46260537523200000 


1497176036400844800000 



Table 2. Volumes for different families of Bh[g] sets. 



7. Related questions 

The problem considered in this paper could be rephrased in a more general setting. Let Q an 
infinite sequence of integers. Let A a random set in [n], and M-x = 0a non-degenerate positive 
(r, m)-system. Does there exist a threshold function for the property " A m contains a non-trivial 
solution x with M ■ x € Q r "? Observe that this paper has dealt with the case Q = {0}. It is 
clear that we need extra assumptions on the the matrix M: for instance, the system of equations 
with matrix 

M =(l 2 

is positive and non-degenerate, but M ■ x E Q 2 when Q = 2N + 1 is not possible. The problem 
of characterizing those matrices which are admissible for a given sequence Q or, on the contrary, 
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characterizing those sequences that are admissible for a fixed system is far from being trivial. 
Nevertheless, even for very simple systems, think of x± — X2, and well studied sequences, like the 
squares or the primes, the study of large sets which avoid this condition is very involved. 

For example, Sarkozy |Sar78] showed that every set with positive upper density contains at 
least two elements whose difference is a square, see also |Lya| . It is, in fact, conjectured that the 
for every e > there exists a set in [n] whose differences are never a square and has size n 1 ~ c . 
Ruzsa [?] proved this conjecture for every e > 0.267. 

In the presented approach, however, some things can be said. For example, consider the 
equation X\ — x-i and the sequence of fc-th powers Q = {x k :i£f)} (the same arguments could 
be applied to more general sequences, like prime numbers or powers of 2 among others). 

Then, it is obvious that, if we denote by Sq(ti) = {x = (x\, x%) G [n] 2 : x\ — x-i G Q} the set 
of solutions, 




by Abel's summation formula. 

It is easy to see that if .A a random set of [n], where every element is chosen uniformly at 
random with probability p, thenp = nr^~ is a threshold function for the property a x\— xi G Q". 
The proof follows the same ideas of Theorem[TJ Once again, if we denote by E x the event x G A 2 
and I x be the associated indicator random variable. It is clear that the expected value for the 
random variable 

x= **> 

xGS e (ri) 

is O (n^p 2 ^. Hence, taking p = o (n^^^j this expected value tends to 0. 
For the second part, we observe that 

A = O (»|Q(n)|V) = Oin^p 3 ) 

and therefore taking p ^> n"^r we obtain that A = o(E[X] 2 ). Consequently, X ~ E[X] 
asymptotically almost surely. 

The methodology developed to deal with systems of linear equations could be adapted to treat 
similar problems in other directions. The same arguments could be adapted in the context of 
finite fields: despite the extra conditions we need to demand to the system (in order to have 
maximum rank), we do not need an Ehrhart's type result in this context. 

Acknowledgments: the authors thank Arnau Padrol, Vincent Pilaud, Lluis Vena and Carlos 
Vinuesa for suggestions and fruitful discussions, and Javier Cilleruelo for a detailed reading of 
the manuscript and support. 
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